Mr. Friedman:

1. This paper used to be CONFIDENTIAL and
registered in its previous edition. We sent a
letter to the Navy in April requesting this
paper, and we still have no answer. This is
indicative of the state of chaos existing in their
training section.

2. There are two footnotes (pp. 7, 12) that
refer to Gaines' "Elementary Cryptanalysis". This
is a sad commentary on the Navy's training resources,
in that they make their only references to a book
available on the public market. This exaggerates
the worth and importance of the Gaines book, and
minimizes the resources of the Navy. Certainly
a passing mention could have been made of the Army's
texts on cryptanalysis, especially since the Navy
has no texts of its own.


3. Other than the poor terminology employed, 
and the plethora of mathematical eyewash that 
makes a simple subject difficult, this paper is 
potentially very good, after substantial editing 
and liberal re-writing. 



- Capt 6. 
THE INDEX OF COINCIDENCE



FOREWORD 



The subject of this pamphlet is coincidence. 

The student may well ask, "What is coincidence and what applications has it?"

"Coincidence" as the term is used here may be defined as a recurrence of a letter
in the same place, or in a corresponding place, as when two texts are lined up
one under the other, letter for letter.

This mathematical evaluation assists the cryptanalyst first in preparing his
material for attack, and later in the actual attack itself. It assists specifically
in answering the following questions:

1) How much like random, or how different from random, is this text?

2) How similar are these texts? 

3) How significant is this variation from random? 

4) How significant is this similarity? 
I SIMPLE MONOGRAPHIC COINCIDENCE

The test of coincidence is the evaluation of the coincidences of letters, or of
digraphs, etc., between two or more messages, or within the same message.

The coincidence or "pairing" test may be consolidated into one final number or
"statistic". That statistic is called the "index of coincidence" and is defined as
the ratio of the actual coincidences to the coincidences to be expected from chance
(coincidences in random text). For English text the expected I.C. is 1.75. For
most European languages the expected I.C. is about 2.00. For random text the
expected I.C. is 1.00.



Assume two pages of cipher text based on a complex cipher which will give a
"flat" frequency table for the entire message. Select a letter at random (say the
3rd) from one page and another from the other page (say the 3rd also).

There is 1 chance in 26 of the first letter's being an "A"

There is 1 chance in 26 of the second letter's being an "A"

There is 1 chance in 676 of both letters' being "A"

There is also 1 chance in 676 of both letters' being "B"

Therefore, the chances of both letters being the same letter (in a chance selection
of cipher text) are:

26 chances in 676, or 1 chance in 26, or 3.8462%.

If we select many pairs of cipher letters, the average number of identical
letters to be expected "in the long run" will be 3.846% (or 1/26) of the total
number of possible coincidences. We call this number the "Expected Coincidence due
to Chance" (random text).

With English text it is different. Take two pages of English text. Make a
chance selection from each page.

There are about 130 chances in 1,000 of the first letter's being an "E"

There are about 130 chances in 1,000 of the second letter's being an "E"

There are about 16,900 chances in 1,000,000 of both letters' being an "E"

Likewise, there are 8,464 chances in 1,000,000 of both being "T",
6,400 chances in 1,000,000 of both being "N", etc.

(See table following.)
Table

Text letter      Chances in 1,000    Chances in 1,000    Chances in 1,000,000
(telegraphic     of 1st letter's     of 2nd letter's     of both letters'
text)            being this letter   being this letter   being this letter

E                      130                 130                 16,900
O                       75                  75                  5,625
A                       74                  74                  5,476
I                       73                  73                  5,329
N                       80                  80                  6,400
R                       76                  76                  5,776
S                       61                  61                  3,721
T                       92                  92                  8,464
D                       42                  42                  1,764
H                       34                  34                  1,156
L                       36                  36                  1,296
C                       31                  31                    961
M                       25                  25                    625
U                       26                  26                    676
P                       27                  27                    729
F                       28                  28                    784
G                       16                  16                    256
Y                       19                  19                    361
B                       10                  10                    100
V                       15                  15                    225
W                       16                  16                    256
K                        4                   4                     16
J                        2                   2                      4
Q                        2                   2                      4
X                        5                   5                     25
Z                        1                   1                      1

Any letter           1,000               1,000                 66,930



Finally there are 66,930 chances in 1,000,000 (the sum of the chances for the
individual letters) of both letters' being the same plain text letter in a chance
selection. Therefore, if we select many pairs of plain text letters, the average
number of identical letters to be expected "in the long run" will be 6.693%
(about 1/15) of the total number of Possible Coincidences.

We may call this number the Expected Coincidences in English Text.
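
The arithmetic of the table can be checked mechanically. The following is only a sketch: the frequencies are those of the telegraphic-text table, and the function names are illustrative, not part of the pamphlet.

```python
# Frequencies per 1,000 letters of telegraphic English, taken from the table.
FREQ_PER_1000 = {
    "E": 130, "O": 75, "A": 74, "I": 73, "N": 80, "R": 76, "S": 61, "T": 92,
    "D": 42, "H": 34, "L": 36, "C": 31, "M": 25, "U": 26, "P": 27, "F": 28,
    "G": 16, "Y": 19, "B": 10, "V": 15, "W": 16, "K": 4, "J": 2, "Q": 2,
    "X": 5, "Z": 1,
}


def chances_per_million(freq_per_1000):
    """Chances in 1,000,000 that two letters drawn at random are the same."""
    return sum(f * f for f in freq_per_1000.values())


def expected_ic(freq_per_1000):
    """Ratio of the plain-text coincidence rate to the flat-text rate 1/C."""
    cells = len(freq_per_1000)
    return cells * chances_per_million(freq_per_1000) / 1_000_000


print(chances_per_million(FREQ_PER_1000))
print(round(expected_ic(FREQ_PER_1000), 2))
```

The sum of squares reproduces the 66,930 of the table, and dividing the rate 6.693% by the flat rate 1/26 gives about 1.74, which is the figure the text rounds to 1.75.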

In actual practice we are concerned with the coincidences between our two texts,
or within our alphabet, etc. The tally or count of these coincidences we call the
Actual Coincidences.

To permit comparisons between results obtained from texts of varying amounts,
it is most convenient to convert to an index number. We call this the Index of
Coincidence and use the abbreviation I.C. or $\iota$.

By definition

$$\iota = \frac{\text{Actual Coincidences}}{\text{Expected Coincidences due to Chance}}$$

The expected I.C. for English (or mono-alphabetical cipher text) is 1.75,
approximately.



The actual I.C. of unknown cipher text may take almost any value but in practice the 
range will generally extend from about .80 to about 2.00 (simple monographic index 
of coincidence). 

The value of the index of coincidence for a given English text will depend on
the actual distribution of letters in that text. Repetitions in short texts will
increase the index of coincidence. Unrelated text (that is, text with few
repetitions) will give an I.C. approaching the theoretical 1.75. As the expected number
of chance coincidences is based on a flat frequency (where each cipher letter is
ultimately used the same number of times), any cipher text that differs radically
from such frequency distribution will have a correspondingly higher I.C. This is
especially noticeable in short cipher texts where the frequency table has not had
an opportunity to "flatten out".

The monographic I.C. of English naval text will increase with small amounts
of text to 1.80 - 2.00 (as compared with the theoretical 1.75) and small amounts
of random text will give I.C.'s of 1.10 - 1.20 (as compared with the theoretical
1.00). The amount of excess attributable to the sample size will be discussed later,
under "standard deviation".

For most European languages the expected I.C. is higher than in English, due to
the more irregular letter distribution of their normal alphabets, namely:

Language       I.C.

Random text    1.0
English        1.7
Italian        1.9
Spanish        2.0
French         2.0
German         2.0



In addition to the simple monographic index of coincidence ($\iota$), there are
occasions when the digraphic index of coincidence ($\iota_2$), trigraphic I.C. ($\iota_3$),
tetragraphic I.C. ($\iota_4$), pentagraphic I.C. ($\iota_5$), etc., can be used to advantage.
They are derived from the normal digraphic (trigraphic, etc.) frequency tables in
the manner indicated in paragraphs 3 to 7.



Expected values for these simple polygraphic indices of coincidence are as follows:

Language       $\iota$    $\iota_2$    $\iota_3$    $\iota_4$    $\iota_5$

Random text    1.00     1.00      1.00      1.00     1.00
English        1.75     4.75     27.89*      X        X
Italian        1.92     5.68       X         X        X
Spanish        2.02     6.29       X         X        X
French         2.02     6.29       X         X        X
German         1.98     6.57       X         X        X

Notes: X = Not computed.
       * = Computed from the only known trigraphic table.
           The correct index might vary widely from this estimate.

In practice the actual polygraphic I.C.'s will usually run higher than their
theoretical values, and a repeated word or two in short texts will make them
skyrocket. As typical examples, we have taken the plain text of four problems in the
elementary and secondary courses and computed the various I.C.'s (from $\iota$ to $\iota_5$,
that is, the monographic, digraphic, trigraphic, tetragraphic and pentagraphic
indices of coincidence).



                   $\iota$     $\iota_2$     $\iota_3$     $\iota_4$      $\iota_5$

Expected random    1.00     1.00      1.00      1.00       1.00
Expected plain     1.75     4.75     27.89       ?          ?

Problem No. 1      1.80     5.23     29.11      427.      7240.
Problem No. 2      2.00     7.73     66.04     1062.     14900.
Problem No. 3      1.91     5.60     42.04      666.     12070.
Problem No. 4      1.74     4.90     31.70      456.      9190.



Suppose we have a language for which we know the overall proportions of the
letters are $p_1, p_2, \ldots, p_C$.

Suppose further that we have two pieces of text from this language and line
them up one above the other, and then count coincident letters. What is the
expected number?

At a particular place the probability of a coincidence involving the i-th
letter is $p_i^2$. Therefore, the C cases being mutually exclusive, the probability
of an incidence is

$$\sum_{i=1}^{C} p_i^2 \tag{2}$$

If the length of overlap is N, then the expected number of incidences is

$$N \sum_{i=1}^{C} p_i^2$$

If the text is such that $p_1 = p_2 = \cdots = p_C = 1/C$, we will refer to it as "flat",
or "random". The probability of an incidence is $1/C$, and the
expected number is $N/C$.

The ratio of the number $g$ found in a comparison to that expected is called the
"index of coincidence", $\iota$:

$$\iota = \frac{g}{N/C} = \frac{Cg}{N} \tag{3}$$

The expected value $\gamma$ of $\iota$ for our language is given by taking the expected
value of $g = N \sum p_i^2$ over the expected value for flat text, or

$$\gamma = \frac{N \sum p_i^2}{N/C} = C \sum_{i=1}^{C} p_i^2 \tag{4}$$

Notice that the expected value of the I.C. for flat text is 1.
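
Formulas (3) and (4) translate directly into a few lines of code. This is a minimal sketch, assuming the texts are plain strings over a 26-letter alphabet; the function names `iota` and `gamma` are chosen here for illustration only.

```python
from collections import Counter


def iota(text_a, text_b, cells=26):
    """Index of coincidence between two aligned texts: C * g / N, as in (3)."""
    n = min(len(text_a), len(text_b))                      # length of overlap N
    g = sum(1 for a, b in zip(text_a, text_b) if a == b)   # coincidences found
    return cells * g / n


def gamma(sample, cells=26):
    """Expected I.C. C * sum(p_i^2), as in (4), with p_i estimated from a sample."""
    m = len(sample)
    return cells * sum((f / m) ** 2 for f in Counter(sample).values())


print(iota("ABCD", "ABXY"))                    # 2 hits in 4 places
print(gamma("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))     # perfectly flat sample
```

A perfectly flat sample gives a value of 1 (up to floating-point rounding), in agreement with the remark that the expected I.C. for flat text is 1.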



IV PRACTICAL APPLICATIONS 




(A) TO DETERMINE WHETHER TWO MESSAGES ARE IN THE SAME KEY

During U.S. Fleet Problem V (1925) the Battle Fleet used a cipher of their own
design. A total of 13 messages in this cipher were submitted to the Code and Signal
Section for attack. Although a different indicator was used in each case, it was
suspected that some of the messages might be in the same key. Two messages in one
key (example No. 1) and two more in another key (example No. 2) were discovered.
(The messages were eventually solved.)

Each message was "lined up" with each other message and the coincidences were
noted. (See examples No. 1 and No. 2.)

Example No. 1
[The two cipher messages of Example No. 1, aligned one above the other, are not
recoverable from this copy.]

Coincident letters are underscored. 12 coincidences in 140 pairs of letters.

Simple monographic coincidence.

Expected coincidences $= \frac{N}{C} = \frac{140}{26} = 5.38$, where N = number of units
examined and C = number of cells for single-letter examination = 26.

I.C. $= \frac{12.0}{5.38} = 2.2$ (Messages are in same key).



There is one repeated trigraph, GPZ, in the messages under examination. This
coincidence indicates that the keys correspond at that point, but does not
necessarily indicate that the keys correspond throughout the message. To prove
coincidence of the keys throughout the two messages, we must have our coincidences spread
through the messages in question (as they were in the above example). Likewise,
digraphic and trigraphic coincidences may be compared and evaluated to an index of
coincidence.

For example, in the above messages, 2 coincident digraphs were found (GP and PZ)
(also one coincident trigraph). In this message there were 139 digraphs and 138
trigraphs in alignment with possibilities of coincidence.

$$\frac{N}{676} = \frac{139}{676} = .206$$

digraphs were to be expected from chance. Two were found, giving an I.C. $= \frac{2}{.206} = 9.70$.

This value, far above the normal 4.75 index of coincidence, does not necessarily
indicate the messages are in the same key. All we really know is that the two keys
are identical in the second group. The extremely high I.C., 9.70, is due entirely
to the small amount of text involved in this example. As the amount of text
decreases, the variation of the I.C. from the expected will become more pronounced,
until at times it is possible that small amounts of text may give entirely false
indications. This effect will be discussed more fully under "standard deviation".



Example No. 2

[The two cipher messages of Example No. 2, aligned one above the other, are not
recoverable from this copy.]

14 coincidences in 220 pairs of letters.

$\frac{N}{C} = \frac{220}{26} = 8.46$ coincidences expected.

I.C. $= \frac{14}{8.46} = 1.66$ (almost normal for English).

These two messages probably are in the same key (and actually proved to be). Note
that there are no repeated digraphs or trigraphs. Note also that coincidences are
well spread out.
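
The three computations of examples No. 1 and No. 2 follow one pattern: coincidences found, divided by units examined over cells. A short sketch (the function name is illustrative, not from the pamphlet):

```python
def index_of_coincidence(found, units, cells):
    """Actual coincidences divided by those expected from chance (units / cells)."""
    expected = units / cells
    return found / expected


# Example No. 1, single letters: 12 coincidences in 140 pairs, 26 cells.
print(round(index_of_coincidence(12, 140, 26), 2))
# Example No. 1, digraphs: 2 coincidences in 139 aligned digraphs, 676 cells.
print(round(index_of_coincidence(2, 139, 676), 2))
# Example No. 2, single letters: 14 coincidences in 220 pairs, 26 cells.
print(round(index_of_coincidence(14, 220, 26), 2))
```

Carried at full precision, the results differ in the second decimal from the hand-computed figures in the text, which round the expected number before dividing.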






(B) TO DETERMINE WHETHER TWO MESSAGES "OVERLAP" IN THE SAME RUNNING KEY

Copy each message on a single line, omitting all spacing. "Line up" the
messages and note coincidences. Then shift one message one place to the right and
note the coincidences. Repeat this process to the end.

If the index at any point is 1.75 or higher, for monographic examination, the
position and the fact of the "overlap" is probable. In this application the digraphic
and trigraphic indices are useful adjuncts to the monographic index.

For some purposes the fundamental unit may be taken to be a set of letters, as
digraphs, trigraphs, etc. Suppose we are interested in digraphic coincidences. Then
the digraphic I.C., $\iota_2$, will be calculated as above, noting that the size C of the
alphabet is larger this time.
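
The sliding procedure can be sketched as follows. This is an illustration under stated assumptions, not the Section's actual routine: messages are plain strings, only rightward shifts of the second message are tried, and the `min_overlap` cutoff is a hypothetical guard against the very short overlaps that the text warns give false indications.

```python
def coincidences(a, b):
    """Count positions at which two aligned strings carry the same letter."""
    return sum(1 for x, y in zip(a, b) if x == y)


def overlap_scan(msg_a, msg_b, cells=26, min_overlap=20):
    """Slide msg_b to the right under msg_a; return (shift, overlap, I.C.) tuples."""
    results = []
    for shift in range(len(msg_a)):
        top = msg_a[shift:]                  # msg_b now starts under msg_a[shift]
        n = min(len(top), len(msg_b))
        if n < min_overlap:
            break
        results.append((shift, n, cells * coincidences(top, msg_b) / n))
    return results


msg_b = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
msg_a = "XXXX" + msg_b                       # msg_b recurs in msg_a at offset 4
best = max(overlap_scan(msg_a, msg_b), key=lambda row: row[2])
print(best)
```

Positions whose index reaches 1.75 or more would then be examined further, with the digraphic and trigraphic counts as supporting evidence.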



V RELATIONS AMONG THESE STATISTICS 



The monographic and digraphic I.C.'s are not independent. For if the
probabilities of the various letters are $p_1, p_2, \ldots, p_C$, then the probability $\pi_{ij}$
of the ij-th digraph is $p_i p_j$, ignoring the cohesion of the language, and for
the moment treating it like newspaper which had been cut into little pieces, one
letter to a piece, and then shuffled and arranged in a line. Using this estimate
of $\pi_{ij}$ we get the I.C.

$$\gamma_2 = C^2 \sum_{i,j} \pi_{ij}^2 = C^2 \sum_i p_i^2 \sum_j p_j^2 = \Big( C \sum_i p_i^2 \Big)^2 = \gamma^2 \tag{5}$$

It is true however that language has cohesion, and that each letter affects the
probability of occurrence of others in its vicinity. Usually then $\gamma_2$ is in excess
of the estimate $\gamma^2$ above. We will sometimes calculate the ratio

$$\kappa_2 = \frac{\gamma_2}{\gamma^2} \tag{6}$$

and call it the digraphic "index of cohesion".


Estimates can be made in the same way for higher I.C.s. One can show that

$$\gamma_i = \gamma_{i-1}\,\gamma \quad \text{or} \quad \gamma_i = \gamma_{i-j}\,\gamma_j \tag{7}$$

In these equations the right members are thought of as quantities already
computed, while the $\gamma$'s on the left are estimates or predictions of quantities which can
be computed from the definition (4).
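
The "shuffled newspaper" estimate can be verified numerically on a toy alphabet. In this sketch the three-letter distribution is invented purely for illustration; the last line applies the cohesion ratio to the English values 4.75 and 1.75 quoted earlier in the pamphlet.

```python
from itertools import product


def gamma_from_probs(probs):
    """Expected I.C. C * sum(p^2) for a distribution over C cells."""
    return len(probs) * sum(p * p for p in probs)


def digraphic_gamma_without_cohesion(probs):
    """Digraphic I.C. if adjacent letters were independent (pair probability p_i * p_j)."""
    pair_probs = [p * q for p, q in product(probs, repeat=2)]
    return gamma_from_probs(pair_probs)      # the alphabet now has C * C cells


def index_of_cohesion(gamma_2, gamma_1):
    """Ratio of the observed digraphic I.C. to the no-cohesion estimate gamma^2."""
    return gamma_2 / gamma_1 ** 2


toy = [0.5, 0.3, 0.2]                        # hypothetical three-letter language
print(gamma_from_probs(toy) ** 2, digraphic_gamma_without_cohesion(toy))
print(round(index_of_cohesion(4.75, 1.75), 2))
```

The two printed values on the first line agree, which is the content of the product estimate; the ratio for English comes out at about 1.55.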

An application of these relations occurs in the study of fractionating* systems,
where as a preliminary to enciphering the text is expressed as a product of two
components, and each component is enciphered separately, and then the cipher text is
recombined to ordinary letters. For instance, each of 25 letters may be represented by
a two digit number, where the first digit comes from 0, 1, 2, 3, 4, and the second
from 5, 6, 7, 8, 9. The argument for these proceeds as for digraphs, and the I.C. of
the combined text is the product of those of the fractions. However, here any
significant deviation from this estimate is not adequately described as cohesion, but must
be due to the dependence of the two fraction streams.

VI THE ROUGHNESS OF A SINGLE SAMPLE

We have introduced the I.C. as a measure of the match between two pieces of text.
We can extend this idea now to a measure of the roughness of a single sample. Suppose
we have a piece of text which we duplicate on two slips of paper and then place them
one under the other for the purpose of counting coincidences. There will be one
position of total coincidence, which we will rule out. If we compute the I.C. for all
other positions, we will have what we call the "index of coincidence of a single
sample".

*Consult Gaines, "Elementary Cryptanalysis," Chapter XXII.

If there are M letters in our sample, we will have looked in $N = \tfrac{1}{2} M(M-1)$
distinct places.

If the text is flat we would expect

$$\frac{N}{C} = \frac{M(M-1)}{2C} \tag{9}$$

coincidences.

If the text is not flat but has proportions $p_1, p_2, \ldots, p_C$ of letters, then
there are $f_i = p_i M$ occurrences of the i-th letter. In the course of our counting we will
compare every letter with every other, so that the i-th letter will give rise to

$$\tfrac{1}{2} f_i (f_i - 1) \tag{10}$$

coincidences, or

$$\tfrac{1}{2} \sum_i f_i (f_i - 1) \tag{11}$$

in all.

Comparing this with the expected in flat text we get the I.C.

$$\delta = \frac{\tfrac{1}{2} \sum_i f_i (f_i - 1)}{M(M-1)/2C} \tag{12}$$

$$\delta = \frac{C \sum_i f_i (f_i - 1)}{M(M-1)} \tag{13}$$

In theoretical context (12) is more useful than (13), but (13) is a little simpler for
computational purposes.


Notice that this formula is different from (4). This is because of the omission
of the perfect hit. If M is large enough then $f_i - 1$ can be replaced by $f_i = p_i M$ and
$M - 1$ by $M$, so that

$$\delta \approx C \sum_i \frac{f_i^2}{M^2} = C \sum_i p_i^2 = \gamma \tag{14}$$

is an asymptotic expression for $\delta$. Notice that $C \ge \gamma \ge 1$.

We see that

$$\delta\,\frac{M-1}{M} = \gamma - \frac{C}{M} \quad \text{or} \quad \delta\,(M-1) = \gamma M - C \tag{15}$$

$$\delta = \frac{\gamma M - C}{M-1} \quad \text{or} \quad \gamma = \frac{C + (M-1)\,\delta}{M} \tag{16}$$

The error in using $\gamma$ (usually more convenient) in place of $\delta$ is

$$\gamma - \delta = \frac{C - \gamma}{M-1} = \frac{C - \delta}{M} \tag{17}$$

This error is always positive, that is, $\gamma$ is an over-estimate of $\delta$. The error
is smaller for larger values of $\gamma$, or larger values of M.
Notice that $\gamma$ is a measure of the shape of the distribution only, and is
independent of the sample size, as is

$$C \sum_i \frac{f_i^2}{M^2} = C \sum_i p_i^2 \tag{18}$$

since $f_i = p_i M$.

But $\delta$ does depend on the sample size M, which is a desirable characteristic,
since random roughness is usually present in small samples. For smaller samples

$$\delta = \gamma - \frac{C - \gamma}{M-1} \tag{19}$$

is seen to be smaller, thus automatically compensating to some degree for small
sample errors. We will usually measure the roughness of single samples by $\delta$, using $\gamma$
as an asymptotic approximation.
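
The relations between the single-sample index and its asymptotic value can be checked on any frequency count. The sketch below assumes the reading of formulas (13) and (15) to (18) as delta = C * sum f(f-1) / M(M-1) and gamma = C * sum f^2 / M^2; the small frequency count is invented for illustration.

```python
def delta(counts, cells=26):
    """Single-sample I.C.: C * sum f(f-1) / (M (M-1))."""
    m = sum(counts)
    return cells * sum(f * (f - 1) for f in counts) / (m * (m - 1))


def gamma_of_sample(counts, cells=26):
    """Asymptotic form: C * sum f^2 / M^2."""
    m = sum(counts)
    return cells * sum(f * f for f in counts) / (m * m)


counts = [3, 2, 1]            # hypothetical sample of M = 6 letters
m, c = sum(counts), 26
d, g = delta(counts), gamma_of_sample(counts)

print(d, (g * m - c) / (m - 1))                    # delta recovered from gamma
print(g - d, (c - g) / (m - 1), (c - d) / m)       # three forms of the error
```

Each printed line shows the same quantity computed in different ways, agreeing up to floating-point rounding, and the error is positive, so the asymptotic value overstates the single-sample index.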



VII EXAMINATION OF CIPHER ALPHABETS AND CIPHER TEXTS

The indices of coincidence discussed in the previous paragraphs may be used in
analyzing the internal structure of a cipher alphabet. A message of 173 letters has
a frequency table as given below:

[Frequency table for the letters A to Z; the individual counts are not recoverable
from this copy. The tally that follows summarizes the same distribution.]

We make a tally count, i.e., the number of times a letter occurs with the same
frequency, thus:

Frequency    Number of      f(f-1)/2     n x f(f-1)/2
   (f)       letters (n)

    0            4              0              0
    1            3              0              0
    2            4              1              4
    3            1              3              3
    4            1              6              6
    6            2             15             30
    7            1             21             21
    8            1             28             28
   10            2             45             90
   13            3             78            234
   14            2             91            182
   19            1            171            171
   22            1            231            231


Coincidences                                1000

We find that there are 1000 coincidences. Although we can not count the coincidences
in the examination of a single cipher text, we can evaluate the various frequency
counts into actual coincidences. Having the actual coincidences from the table
(1000), we obtain the I.C. by formula (13):

$$\delta = \frac{2C \times \text{coincidences}}{M(M-1)}$$

Substituting,

$$\delta = \frac{1000 \times 52}{173 \times 172} = 1.75$$

The index of coincidence indicates that a monoalphabetic substitution was employed.
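
The tally method lends itself to a short check. In this sketch the (f, n) pairs are those of the tally for the 173-letter message; the function names are illustrative.

```python
# (frequency f, number of letters n having that frequency), from the tally.
TALLY = [(0, 4), (1, 3), (2, 4), (3, 1), (4, 1), (6, 2), (7, 1),
         (8, 1), (10, 2), (13, 3), (14, 2), (19, 1), (22, 1)]


def coincidences_from_tally(tally):
    """Sum of n * f(f-1)/2 over the tally rows."""
    return sum(n * f * (f - 1) // 2 for f, n in tally)


def delta_from_tally(tally):
    """Single-sample I.C.: 2C * coincidences / (M (M-1))."""
    cells = sum(n for _, n in tally)          # C, here 26
    m = sum(f * n for f, n in tally)          # M, here 173
    return 2 * cells * coincidences_from_tally(tally) / (m * (m - 1))


print(coincidences_from_tally(TALLY))
print(round(delta_from_tally(TALLY), 2))
```

The tally accounts for all 26 cells and all 173 letters, and yields 1000 coincidences and an index of about 1.75.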
As a second example to show the results obtained from small texts, we calculate
as follows from a frequency count of 36 letters assumed to be monoalphabetic.

A  B  C  D  E  F  G  H  I  J  K  L  M
3     5        3  1  1  1        3

N  O  P  Q  R  S  T  U  V  W  X  Y  Z
2  1        1  3  1  2     1  4  4

Tally of cells

 (f)     n     1/2 f(f-1)     n x 1/2 f(f-1)

  0     10         0                0
  1      7         0                0
  2      2         1                2
  3      4         3               12
  4      2         6               12
  5      1        10               10

                   Coincidences    36



T f C 3 



T f c 4 T 




I.C. $= \frac{36 \times 26 \times 2}{36 \times 35} = 1.49$.

The alphabet in question was actually a monoalphabetic substitution. With a small
amount of text, the simple index is somewhat indeterminate. Using the triple and
quadruple indices, the results are even more so, and at times may give even false
indications. It is again emphasized that sufficient text must be used to give positive
indications.



As another elementary example of the application of the index of coincidence to
the internal examination of a cipher text, suppose we have a 5-letter repetition
at an interval of 85. Is the cipher a polyalphabetic cipher of 5 or of 17 alphabets? By
means of internal examination with the index of coincidence we can determine what type
of cipher we have. Make a frequency count of the cipher alphabets assuming 5 and then
17 alphabets.

Calculate the index of coincidence in each case for one or more alphabets. The indices
of higher value will indicate which assumption is correct. If neither assumption shows
positive results (an index around 1.7) we may have a progressive cipher, running key
cipher, auto key cipher, or cipher of even more complex nature.
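
The test of the two hypotheses can be sketched as follows: the cipher text is dealt into as many columns as the assumed number of alphabets and the single-sample I.C. of the columns is averaged. The averaging and the toy cipher text are assumptions made here for illustration; the pamphlet asks only that the index be calculated "for one or more alphabets".

```python
from collections import Counter


def delta(text, cells=26):
    """Single-sample I.C. of one column of cipher text."""
    m = len(text)
    if m < 2:
        return 0.0
    return cells * sum(f * (f - 1) for f in Counter(text).values()) / (m * (m - 1))


def mean_delta_for_period(cipher, period):
    """Average I.C. over the columns obtained by assuming `period` alphabets."""
    columns = [cipher[i::period] for i in range(period)]
    return sum(delta(col) for col in columns) / period


cipher = "ABCDE" * 20          # artificial text whose true period is 5
for period in (5, 17):
    print(period, round(mean_delta_for_period(cipher, period), 2))
```

With real traffic the correct assumption would show indices near 1.7 rather than the extreme value this artificial text produces; the point of the sketch is only that the correct period gives the rougher columns.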

Another elementary application is as follows. We have a cipher message which has
been intercepted. The I.C. is computed and found to be $\delta = 1.79$.

This is so rough as to resemble plain text. A simple substitution has the property of
leaving $\delta$ unchanged, and so has a transposition. Multi-alphabet substitutions lower
the I.C. So we are reduced to three hypotheses, that our sample is either transposed
plain text, a simple substitution, or both substitution and transposition.



The digraphic I.C* is computed, dg=4. 85. Remember that its expected value is 
3.fl5 - d e in view of the known roughness. Therefore the index of coherence is K t - 



4.85. T _ 
3 ^ 5' 156 



Since a transposition destroys coherence we can assert that no transposition is in- 
volved. Multigraphic Indices of coincidence are preserved by a simple substitution. 
VIII THE STANDARD DEVIATION

We have already several times referred to the fact that these statistics are useful
only if the sample is large enough. To get an idea as to whether this is the case or
not we measure our results in terms of a standard deviation, "sigma". One standard
deviation is roughly one half the width of a band which when placed about the average
will include two-thirds of the data. It is a measure of the dispersion. If sigma is
large the data is spread out wide, and if it is small the numbers are close together.
In a binomial distribution the standard deviation is $\sigma = \sqrt{Npq}$, where N is the
number of observations and p and q are the probabilities of success and failure.

To estimate the significance of $\delta$, we refer to (12), where the denominator is
the expected number of incidences in flat text, and the numerator is the number found.
Assuming a binomial distribution of the incidences we find the variance

$$\sigma^2 = Npq = \frac{M(M-1)(C-1)}{2C^2}$$

If s is the "sigmage", or deviation of the number found divided by $\sigma$, we have

$$s = \frac{\tfrac{1}{2} \sum_i f_i (f_i - 1) - \dfrac{M(M-1)}{2C}}{\sqrt{\dfrac{M(M-1)(C-1)}{2C^2}}} \tag{20}$$

$$s = \frac{C \sum_i f_i (f_i - 1) - M(M-1)}{\sqrt{2(C-1)\,M(M-1)}} \tag{21}$$

$$s = \frac{(\delta - 1)\sqrt{M(M-1)}}{\sqrt{2(C-1)}} \tag{22}$$

$$s \approx \frac{(\delta - 1)\,M}{\sqrt{2(C-1)}} \tag{23}$$

For M > 51 the error is less than 1%.

Notice that the sigmage is a linear function of the sample size M, and also
linear with the "bulge" $\delta - 1$. The denominator is relatively unimportant to the
estimation of s except in shifting from code to cipher, when C can change from 500,000
to as small as 10.

The bulge $\delta - 1$ is a quantity which will recur frequently.

Formula (23) does not apply to the iota I.C. For that we have the expected number
$N/C$ and $g$ found. Then $\sigma^2 = \frac{N}{C}\big(1 - \frac{1}{C}\big)$ and the sigmage is

$$s = \frac{g - N/C}{\sqrt{\frac{N}{C}\big(1 - \frac{1}{C}\big)}} = \frac{(\iota - 1)\sqrt{N}}{\sqrt{C-1}} \tag{24}$$

In this case the sigmage is linear with the bulge $\iota - 1$, but varies only as the
square root of the sample size.

The significance of s is given in the following table, which lists the
probability of getting s or a larger result from chance.
  s      prob.          s       prob.

  .1     .4602         2.9      .0019
  .2     .4207         3.0      .00135 = 1/800
  .3     .3821         3.2      .00069
  .4     .3446         3.4      .00034 = 1/3000
  .5     .3085         3.6      .00016
  .6     .2743         3.8      .00007
  .7     .2420         4.0      .00003 = 1/33,000
  .8     .2119         4.101    .0000206
  .9     .1841         4.200    .0000134
 1.0     .1587         4.299    .0000086
 1.1     .1357         4.398    .0000055
 1.2     .1151         4.497    .0000035 = 1/300,000
 1.3     .0968         4.596    .0000022
 1.4     .0808         4.695    .0000014 = 1/711,000
 1.5     .0668         4.794    .0000008 = 1/1,250,000
 1.6     .0548         4.907    .00000046164
 1.7     .0446         5.006    .00000027741
 1.8     .0359         5.105    .00000016513
 1.9     .0287         5.209    .00000009736
 2.0     .0228         5.303    .00000005686 = 1/18 million
 2.1     .0179         5.402    .00000003290 = 1/30 million
 2.2     .0139         5.501    .00000001850
 2.3     .0107         5.600    .00000001070
 2.4     .0082         5.798    .00000000335
 2.5     .0062         6.080    .00000000060
 2.6     .0047         6.503    .00000000004
 2.7     .0035         6.785    .00000000001 = 1/100,000 million
 2.8     .0026










EXAMPLE

Suppose an unknown cipher in four digit groups is being investigated and the question
is whether it is reenciphered* or not. If it is, the text can be expected to resemble
random more than if it is not. We make a frequency count on the 10,000 groups and then
determine how likely such a distribution is by chance. If it is not likely we must
seek an explanation.

If 560 groups are counted and 76 then $\delta = 2.41$. This
gives a sigmage $s = 5.6$. The table shows that we would have to repeat this procedure
about 120 million times on random material to get a like result. We can say that this
result is unlikely by chance and that an explanation is called for. The obvious one
is that there is no reencipherment, or that it is very feeble. Tests to check this
further hypothesis can be quickly devised.

*See Gaines, "Elementary Cryptanalysis," page 2.
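
The sigmage formulas and the probability table can be reproduced in a few lines. This sketch assumes that formula (23) reads s = (delta - 1) M / sqrt(2 (C - 1)), that formula (24) reads s = (iota - 1) sqrt(N) / sqrt(C - 1), and that the table is the upper tail of the normal curve; the function names are illustrative.

```python
import math


def sigmage_delta(delta, m, cells):
    """Approximate sigmage of a single-sample I.C., after formula (23)."""
    return (delta - 1.0) * m / math.sqrt(2.0 * (cells - 1))


def sigmage_iota(iota, n, cells):
    """Sigmage of a two-text I.C., after formula (24)."""
    return (iota - 1.0) * math.sqrt(n) / math.sqrt(cells - 1)


def tail_probability(s):
    """Chance of a deviation of s sigmas or more under the normal curve."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))


# The four-digit-group example: M = 560, C = 10,000, delta = 2.41.
print(round(sigmage_delta(2.41, 560, 10_000), 1))
# Two rows of the table, s = 1.0 and s = 3.0.
print(round(tail_probability(1.0), 4), round(tail_probability(3.0), 5))
```

The example comes out at a sigmage of about 5.6, and the tail probabilities agree with the tabulated .1587 and .00135.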
IX TO DETERMINE WHETHER TWO ALPHABETS ARE IDENTICAL ALPHABETS

Assume that a complex cipher using secondary alphabets has been analyzed and
reduced to 50 alphabets. There are only 26 possible secondary alphabets, so some of
these 50 alphabets can be combined. Visual inspection is too inaccurate to be
trusted, except with an abnormally large amount of text.

Four sample alphabets are given in the tables following, A and B.

Table "A": Frequency Tables

Letter    No. 1    No. 2    No. 3    No. 4

A                             2        2
B           2                 2
C           1                 1        1
D                    1                 1
E           3        3        2
F           1                          1
G                             2
H           2
I           2                 3        2
J           1        1
K                    2
L                    1
M           2                 1
N           3        2        3        1
O                                      2
P                    1                 1
Q                    1
R                    1
S                                      3
T                    1
U
V                    2
W                             1        3
X                    2                 2
Y           2        2        1        1
Z           1                 2

Total      20       20       20       20



Table "B": Repeated Letters

Letter   No. 1-&-2   No. 1-&-3   No. 1-&-4   No. 2-&-3   No. 2-&-4   No. 3-&-4

A                                                                        4
B                        4
C                        1           1                                   1
D                                                            1
E            9           6                       6
F                                    1
G
H
I                        6           4                                   6
J
K
L
M                        2
N            6           9           3           6           2           3
O
P                                                            1
Q
R
S
T
U
V
W                                                                        3
X                                                            4
Y            4           2           2           2           2           1
Z                        2

Total       19          32          11          14          10          18

Line up two alphabets at a time and cross multiply the repetitions for each letter
(see table "B"). "E" occurs 3 times in No. 1 alphabet and 3 times in No. 2. There are
9 pairs of "E's" in No. 1 and No. 2. Add the coincidences noted for a pair of alphabets.

There are 20 letters in each alphabet, which gives 400 pairs of letters to deal
with. Chance would give 400/26, or 15.4 repeated letters in two alphabets. The index
of coincidence is the sum of the actual coincidences divided by 15.4.

The indices are as follows:

No. 1-&-2 1.23

No. 1-&-3 2.08 (above the normal 1.75; alphabets No. 1-&-3 must be identical)

No. 1-&-4 .71

No. 2-&-3 .91

No. 2-&-4 .65
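
The cross-multiplication described above can be sketched in a few lines of Python. This is only an illustration: the function name `index_of_coincidence` and the two sample strings are hypothetical and are not taken from the tables of this pamphlet.

```python
from collections import Counter

def index_of_coincidence(alphabet_a, alphabet_b, alphabet_size=26):
    """Cross-multiply letter counts and divide by the chance expectation."""
    counts_a = Counter(alphabet_a)
    counts_b = Counter(alphabet_b)
    # Every occurrence of a letter in A pairs with every occurrence of it in B.
    coincidences = sum(n * counts_b[letter] for letter, n in counts_a.items())
    # Chance gives one coincidence for every `alphabet_size` pairs of letters.
    expected = len(alphabet_a) * len(alphabet_b) / alphabet_size
    return coincidences / expected

# Hypothetical columns of cipher text (8 letters each, so 64 pairs in all).
sample_a = "EEENNNYY"
sample_b = "EEENNYYX"
print(index_of_coincidence(sample_a, sample_b))
```

For two alphabets of 20 letters with 19 coincidences, the same arithmetic gives 19 / (400/26), which is the 1.23 quoted for No. 1-&-2.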



X THE CROSS I. C. 



Suppose we have two stretches of text of which the distributions are given by
$p_1, p_2, \ldots, p_c$ and $q_1, q_2, \ldots, q_c$, with

\[ \sum_{i=1}^{c} p_i = \sum_{i=1}^{c} q_i = 1 . \]

If these are placed one above the other the probability of a coincidence at any
one position is

\[ \sum_{i=1}^{c} p_i q_i \]

and the I.C. is

\[ \xi = c \sum_{i=1}^{c} p_i q_i . \]

We can show that the expected value of $\xi$ is 1. Suppose that we apply a permutation
to the q's and recompute $\xi$. If we do this for all $c!$ permutations and average
them we get

\[ E(\xi) = \frac{1}{c!} \sum_{\text{all permutations}} c \sum_{i=1}^{c} p_i q_{i'} \]

where $q_{i'}$ is one of the q's, depending on the permutation. Each q comes into a given
position $(c-1)!$ times. Thus

\[ \sum_{\text{all permutations}} q_{i'} = (c-1)! \sum_{j=1}^{c} q_j = (c-1)! \]

and

\[ E(\xi) = \frac{c}{c!} \sum_{i=1}^{c} p_i \sum_{\text{all permutations}} q_{i'}
          = \frac{c\,(c-1)!}{c!} \sum_{i=1}^{c} p_i = 1 . \]

Notice that this result is independent of the roughness of either distribution. 
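
The statement that the cross I.C. averages to 1 over all permutations can be checked by brute force for a small alphabet. The sketch below is an illustration only: the function names and the two five-letter distributions are hypothetical, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import permutations

def cross_ic(p, q):
    """Cross I.C.: c times the sum of products of aligned probabilities."""
    c = len(p)
    return c * sum(pi * qi for pi, qi in zip(p, q))

def mean_cross_ic(p, q):
    """Average the cross I.C. over every permutation of the q's."""
    values = [cross_ic(p, perm) for perm in permutations(q)]
    return sum(values) / len(values)

# Hypothetical rough distributions over a 5-letter alphabet (each sums to 1).
p = [Fraction(n, 10) for n in (5, 3, 1, 1, 0)]
q = [Fraction(n, 8) for n in (4, 2, 1, 1, 0)]
print(mean_cross_ic(p, q))
```

Changing either distribution to something rougher or flatter leaves the average unchanged, provided each still sums to 1.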



If one of the samples of text is flat, say $q_i = 1/c$, then

\[ \xi = c \sum_{i=1}^{c} p_i q_i = \sum_{i=1}^{c} p_i = 1 . \]

If both samples are very rough, then $\xi$ fluctuates widely as the q's are permuted.
The question arises, how wide is the distribution of $\xi$?
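A standard measure is the "variance"

\[ \sigma^2 = E(\xi^2) - [E(\xi)]^2 . \tag{27} \]

We evaluate

\[ E(\xi^2) = \frac{1}{c!} \sum_T \Bigl( c \sum_{i=1}^{c} p_i q_i \Bigr)^2 \tag{28} \]

where $\sum_T$ is understood to mean the sum over all possible permutations of the q's.

Then

\[ E(\xi^2) = \frac{1}{c!} \sum_T c^2 \sum_{i=1}^{c} \sum_{j=1}^{c} p_i p_j q_i q_j
            = \frac{c^2}{c!} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i p_j \sum_T q_i q_j . \tag{29} \]

The term $\sum_T q_i q_j$ can be evaluated in each of two cases, $i \neq j$ and $i = j$.
We will use $P = c \sum p_i^2$ and $Q = c \sum q_i^2$ as the I.C.'s of the two samples.

For $i \neq j$,

\[ \sum_T q_i q_j = (c-2)! \sum_{i=1}^{c} \sum_{j \neq i} q_i q_j
                  = (c-2)! \sum_{i=1}^{c} q_i (1 - q_i)
                  = (c-2)! \sum_{i=1}^{c} (q_i - q_i^2)
                  = (c-2)! \Bigl( 1 - \frac{Q}{c} \Bigr) . \]

For $i = j$,

\[ \sum_T q_i^2 = (c-1)! \sum_{i=1}^{c} q_i^2 = (c-1)! \, \frac{Q}{c} . \]

Thus (29) becomes

\[ E(\xi^2) = \frac{c^2}{c!} \sum_{i=1}^{c} \sum_{j \neq i} p_i p_j \, (c-2)! \Bigl( 1 - \frac{Q}{c} \Bigr)
            + \frac{c^2}{c!} \sum_{i=1}^{c} p_i^2 \, (c-1)! \, \frac{Q}{c} \tag{30} \]

\[ = \frac{c^2}{c!} (c-2)! \Bigl( 1 - \frac{Q}{c} \Bigr) \sum_{i=1}^{c} \sum_{j \neq i} p_i p_j
   + \frac{c^2}{c!} (c-1)! \, \frac{Q}{c} \sum_{i=1}^{c} p_i^2 \tag{31} \]

\[ = \frac{c^2 (c-2)!}{c!} \Bigl( 1 - \frac{Q}{c} \Bigr) \Bigl( 1 - \frac{P}{c} \Bigr)
   + \frac{c^2 (c-1)!}{c!} \, \frac{Q}{c} \, \frac{P}{c}
   = \frac{(c-Q)(c-P)}{c(c-1)} + \frac{PQ}{c} \]

\[ = \frac{c^2 - cP - cQ + PQ}{c(c-1)} + \frac{cPQ - PQ}{c(c-1)}
   = \frac{c^2 - cP - cQ + cPQ}{c(c-1)}
   = \frac{c - P - Q + PQ}{c-1} . \tag{32} \]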
Then since $E(\xi) = 1$, we have

\[ \sigma^2 = E(\xi^2) - [E(\xi)]^2 = \frac{c - P - Q + PQ}{c-1} - 1
            = \frac{c - P - Q + PQ - c + 1}{c-1}
            = \frac{PQ - P - Q + 1}{c-1}
            = \frac{(P-1)(Q-1)}{c-1} . \tag{33} \]

This says that the square of a standard deviation $\sigma$ is the product of the bulges
over $c-1$.
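
The variance of the cross I.C. can be compared with the closed form by enumerating every permutation of a small alphabet. This is a sketch under stated assumptions: the helper names and the two four-letter distributions are hypothetical, and exact fractions are used so the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import permutations

def cross_ic(p, q):
    """Cross I.C.: c times the sum of products of aligned probabilities."""
    c = len(p)
    return c * sum(pi * qi for pi, qi in zip(p, q))

def variance_by_enumeration(p, q):
    """Variance of the cross I.C. over all permutations of the q's."""
    values = [cross_ic(p, perm) for perm in permutations(q)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_by_formula(p, q):
    """Product of the two bulges divided by c - 1."""
    c = len(p)
    big_p = c * sum(x * x for x in p)   # I.C. of the first sample
    big_q = c * sum(x * x for x in q)   # I.C. of the second sample
    return (big_p - 1) * (big_q - 1) / (c - 1)

# Hypothetical distributions over a 4-letter alphabet (each sums to 1).
p = [Fraction(n, 10) for n in (6, 2, 1, 1)]
q = [Fraction(n, 8) for n in (5, 2, 1, 0)]
print(variance_by_enumeration(p, q), variance_by_formula(p, q))
```

The two printed values agree; if either sample is flat its bulge is zero and the variance vanishes.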

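The deviation of $\xi$ from its expected value, expressed in units of $\sigma$, is

\[ S = \frac{\xi - 1}{\sqrt{\dfrac{(P-1)(Q-1)}{c-1}}}
     = (\xi - 1) \sqrt{\frac{c-1}{(P-1)(Q-1)}} , \]

a measure of the significance which can be judged from table I.

The function $\xi$ can be used as a measure of correlation between two distributions.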



XI THE COINCIDENCE TEST USED TO ALIGN 
SECONDARY ALPHABETS INTO A PRIMARY ALPHABET 

We give here a special application of the I.C. statistic. An actual problem from
the elementary course is used, problem 1 of assignment 6. Special frequency distribution
tables were made of the sample lined up into 26 columns, i.e., lines of 26 letters
each.

This problem happens to be enciphered by means of a Vigenere table, the columns
being used in succession. Consequently if the cipher text is lined up 26 wide each
column is enciphered by a monoalphabetic substitution. Each alphabet is a slide on
that in the next column. If we knew the plain and cipher sequences the text could be
decrypted. The problem is to recover these sequences. Since the sequence is not
alphabetical, adjacent frequency counts as given in table C appear unrelated. But if
we look at the rows they must be related by being slides on each other. If we can
establish those slides we will have the cipher sequence.
By use of this special table, table "C", we can build up the cipher component
used by matching these frequency distributions. In selecting distributions to match
we want to obtain:

(a) A maximum total number of letters involved (as the distribution will then
be more reliable).

(b) A distribution with a normal count (i.e., similar to normal alphabet).

(c) A distribution without any one letter of abnormal frequency (as this gives
too much weight to one letter).

When frequency distributions of two letters are properly matched, high should
pair with high, low with low, blank with blank, etc. The mathematical value of each
relative position is found as follows:

(a) With the two frequency distributions in question written on paper strips and
slid one against the other, for any one position we cross multiply the frequencies in
alignment, and then add the products of all these multiplications. This is the total
number of pairs or coincidences involved (see table "D").

(b) Cross-multiply the total count of the distribution of the first letter by the
count of the second letter. This is the total number of possible pairs of letters.
Chance would produce one coincidence in twenty-six. Therefore, divide this product by
twenty-six, which gives the number of expected chance coincidences.

(c) Divide the number of the actual coincidences by the expected number of chance
coincidences (that is, divide (a) by (b)). The resulting number is the Index of
Coincidence.

To prove correct alignment:

(a) The index for the given relative position of two distributions must be higher
than for all other positions, with no close second.

(b) The index should be 1.50 or higher (preferably 1.75 or higher).

(c) There must be only one acceptable alignment.

Indeterminate results will be encountered in some cases, particularly with
insufficient text.
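
The sliding of one strip against another, with steps (a) to (c) applied at each position, can be sketched as follows. This is an illustration under assumptions: the function name `slide_indices`, the zero-based `shift` convention and the 26 frequencies are all hypothetical and are not the counts of table "C".

```python
def slide_indices(master, strip, alphabet_size=26):
    """Index of coincidence for every slide of `strip` against `master`."""
    n = len(master)
    # Step (b): possible pairs divided by twenty-six.
    expected = sum(master) * sum(strip) / alphabet_size
    indices = []
    for shift in range(n):
        # Step (a): cross multiply the frequencies in alignment and add.
        total = sum(master[(i + shift) % n] * strip[i] for i in range(n))
        # Step (c): actual coincidences over expected chance coincidences.
        indices.append(total / expected)
    return indices

# Hypothetical master distribution; the strip is the same distribution slid by 7.
master = [7, 1, 3, 4, 12, 2, 2, 6, 7, 0, 1, 4, 2,
          7, 8, 2, 0, 6, 6, 9, 3, 1, 2, 0, 2, 0]
strip = master[7:] + master[:7]

indices = slide_indices(master, strip)
best = max(range(len(indices)), key=indices.__getitem__)
print(best, round(indices[best], 2))
```

In practice the highest index is accepted only if it also meets the three conditions listed above; with short text the second-best slide may be too close to call.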

Table "E" gives the total coincidences at the various positions of one strip 
slid against another. 

From table "C", it is seen that certain distributions have the following proper- 
ties, (referring to our three desired properties): 

B (cipher) has a total of 36 letters, with 14 different cells involved. Its
highest frequency is 5. Good. Approaches normality. Maximum text.

V (cipher) has a total of 27 letters with 15 different cells involved. Its
highest frequency is 4. Not good -- too flat.

K (cipher) has a total of 27 letters with 13 different cells involved. Its
highest frequency is 4. Not good -- too flat.

G (cipher) has a total of 25 letters with 15 different cells involved. Its
highest frequency is 5. Not good -- too flat.

D (cipher) has a total of 22 letters with only 10 different cells involved, but
its highest frequency is 7. Not good -- too peaked.

T (cipher) has a total of 26 letters with 11 different cells involved. Its
highest frequency is 5. Good. Approaches normality.

A (cipher) has a total of 25 letters with 11 different cells involved. Its
highest frequency is 5. Good. Approaches normality.

N (cipher) has a total of 24 letters with 11 different cells involved. Its
highest frequency is 4. Good. Approaches normality.

B, T, A and N are the best choices. Match T, A and N against B, then match A 
and N against T, finally match N against A. One of these combinations should 
give a positive index of coincidence, and thus serve as a starting point. 
Table "E" -- Table of Total Coincidences

B = Master Distribution

Plain  - 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26    Chance coincidences
Cipher - B
T - X    43 49 -- 39 53 4f 40 37 -- 60 -- 64 39                                   36
A - X    38 66 -- 59 42 50 -- 42                                                  35
N - X    53 40 -- -- 43 39 63 -- -- -- -- --                                      33

NOTE: The numbers 43, 49, etc., represent the successive "total coincidences",
i.e., the sums of products of frequencies at successive points of coincidence.

T = Master Distribution

Plain  - 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26    Chance coincidences
Cipher - T
A - X    -- 32 28 29 -- 42 31 -- 51 -- 43 --                                      25
N - X    28 50 -- 28 26 34 34 -- -- 31 32 34                                      24

A = Master Distribution

Plain  - 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26    Chance coincidences
Cipher - A
N - X    -- 26 28 26 -- 43 -- 34 27 27 29 -- 27                                   23

T-1 -- N-5 = Master Distribution

Plain  - 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26    Chance coincidences
Cipher - T       N
B - X    -- 100 123 -- 80 78 74 82 -- 75 73 89 96                                 69
A - X    57 56 -- 61 65 -- 94 -- 65                                               48
Table "F" -- Table of Coincidences

Master Distribution T-1 N-5 B-11 A-17

Plain     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26    Chance coincidences
Cipher    T  V  I  D  N  Y  U  F  P  M  B  K  H  Q  Z  E  A  L  G  X  R  J  W  S  C  O
BUILDING UP THE CIPHER COMPONENT

By utilizing the principles described in the previous sections, we can build up
the cipher component. Take B (cipher) as the master distribution, since it is
acceptable and contains the highest count. Copy the frequencies of B (from table "C") at
the bottom of a strip of paper, and repeat this sequence to the right. Over the first
sequence write the numbers 1 to 26 as shown in table "D". Under No. 1 write the
letter "B". The numbers represent the various unknown letters of the cipher component.
Make similar master distribution strips for T and A. Next, copy the distribution of
T (cipher) (from table "D") at the top of a strip of paper. Only one sequence is
required for this strip, and the numbers are omitted. Indicate the space corresponding
to column No. 1 (table "D") by the letter T (see table "E"). In a like manner make
strips for A and N.

Note: The letter on each strip is an indicator to mark column No. 1 for that
letter. When strips are properly aligned, the indicators show the relative
positions of these letters in the cipher component. The student is advised
to prepare strips for himself and follow these processes.

First, match T against B. As no two letters can occupy the same position
in the cipher component, begin by setting the T indicator at No. 2 on the B master
alphabet. Note the coincidences. Next, slide T to No. 3, and note the coincidences.
Continue this process to No. 26, and record the successive coincidences in tabular form
(see table "F"). In many cases lack of good coincidences will be obvious by inspection
and the count need not be made. In this way we discover that B and T give high
indices of coincidence in two different alignments (indices computed in accordance
with the rule on page 23).

Index of B (1) - T (7)     64/36 = 1.77 (good)

Index of B (1) - T (11)    60/36 = 1.67 (good)

All other alignments give such low indices that they can be at once eliminated. The
above two indices, however, are both high enough to be significant, and as the second
is so close to the first, it cannot be disregarded.

There can be only one acceptable point of coincidence; therefore, it is necessary to
match A against B, and N against B, to see if more conclusive results can be attained.

Index of B (1) - A (5)     66/35 = 1.89 (excellent)

Index of B (1) - A (7)     59/35 = 1.69 (good)

(other alignments are eliminated)

Index of B (1) - N (21)    63/33 = 1.91 (excellent)

Index of B (1) - N (6)     53/33 = 1.60 (fair)

(other alignments are eliminated)
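
The divisors used in the indices above are the expected chance coincidences, and they can be checked from the letter totals quoted earlier for B, T, A and N (36, 26, 25 and 24). The sketch below is illustrative only; the function name is hypothetical, and rounding the expectation to a whole number is an assumption about how the divisors were obtained.

```python
# Letter totals of the four chosen distributions, as stated in the text.
totals = {"B": 36, "T": 26, "A": 25, "N": 24}

def chance_coincidences(total_a, total_b, alphabet_size=26):
    """Possible pairs of letters divided by twenty-six."""
    return total_a * total_b / alphabet_size

# B is the master distribution; T, A and N are matched against it in turn.
for letter in ("T", "A", "N"):
    chance = chance_coincidences(totals["B"], totals[letter])
    print(letter, round(chance))
# prints:
# T 36
# A 35
# N 33
```

Dividing a total of coincidences from table "E" by the corresponding divisor gives the index for that alignment.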



Since there is no outstanding coincidence with "B" as the "master-alphabet", try "T"
and "A" as the "master-alphabet".

Index of T (1) - A (17)                51/25 = 2.04 (excellent)

Index of T (1) - A (19)                43/25 = 1.72 (good)

Index of T (1) - A (11)                42/25 = 1.68 (good)

Index of T (1) - N (5)                 50/24 = 2.08 (excellent)

Index of T (1) - N (12), (16), (26)    34/24 = 1.42 (poor)

Index of A (1) - N (15)                43/23 = 1.87 (good)

Index of A (1) - N (17)                34/23 = 1.48 (poor)

The most certain combination is T (1) - N (5), and there is no doubt as to its
correctness. This locates "N" relative to "T" in the cipher component, and allows
us to consolidate their frequencies.
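
Consolidating two distributions once their relative position is fixed amounts to adding the strip into the master at the chosen space. The following is a minimal sketch, assuming 1-based positions as on the paper strips; the function name and the four-cell example are hypothetical and much shorter than the 26-cell strips of the text.

```python
def consolidate(master, strip, position):
    """Add `strip` into `master` with the strip's column No. 1 at `position`.

    `position` is 1-based, as on the paper strips; both lists wrap around.
    """
    n = len(master)
    offset = position - 1
    # The strip cell that lies over master cell i is cell (i - offset).
    return [master[i] + strip[(i - offset) % n] for i in range(n)]

# Hypothetical 4-cell distributions, purely to show the wrap-around.
master = [1, 0, 0, 0]
strip = [5, 6, 7, 8]
print(consolidate(master, strip, 2))
# prints: [9, 5, 6, 7]
```

The combined list then serves as the new master distribution against which the remaining strips are slid.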



For a new master distribution add the frequencies of T (at space No. 1) to those
of N (at space No. 5) (see table "E"). Match "B" and "A" against this new master
distribution:

Index of T (1) - N (5) - B (11)    123/69 = 1.81 (good)

Index of T (1) - N (5) - B (7)     100/69 = 1.45 (poor)

Index of T (1) - N (5) - B (26)     96/69 = 1.40 (poor)

Index of T (1) - N (5) - A (17)     94/48 = 1.96 (excellent)

Index of T (1) - N (5) - A (15)     65/48 = 1.35 (very poor)

Index of T (1) - N (5) - A (19)     65/48 = 1.35 (very poor)

"B" and "A" can now be consolidated with "T" and "N" for the final "master
distribution" as follows:

Plain       1   2   3   4   5   6   7   8   9  10  11  12  13
Cipher      T               N
T & N       4   3   -   1   1   -   -   -   -   -   7   6   -
B           2   3   1   -   -   1   -   1   2   -   1   3   -
A           3   2   -   -   -   -   -   -   1   -   2   -   -

Plain      14  15  16  17  18  19  20  21  22  23  24  25  26
Cipher
T & N       1   -   -   7   3   1   5   -   -   2   5   -   4
B           1   -   4   4   1   -   -   -   -   2   5   -   5
A           1   2   -   1   3   -   3   -   -   -   2   -   5
This locates "B" and "A" relative to "T" and "N" in the cipher component, in addition
to giving the combined frequencies of all four letters.

FINAL MASTER DISTRIBUTION

Plain Comp.     1   2   3   4   5   6   7   8   9  10  11  12  13
Cipher Comp.    T               N                       B
Comb. Freq.     9   8   1   1   1   1   -   1   3   -  10   9   -

Plain Comp.    14  15  16  17  18  19  20  21  22  23  24  25  26
Cipher Comp.                A
Comb. Freq.     3   2   4  12   7   1   8   -   -   4  12   -  14



Analyze the preceding steps. T gave two possible alignments with B, and, as we
now see, the incorrect position gave the higher index. N also gave two possible
alignments with B. (B was at fault due to its erratic letter distribution.) However,
when T and N are combined, giving twice as many letters in the master distribution,
B fitted in with only one possible alignment. Adding B and A gives twice as many
letters in the master distribution and this should make future results even more
positive. The master distribution (of 100 letters or more) should approximate a normal
frequency distribution and will give a standard to which all the other distributions
can be referred. Hereafter, variations in the highest index of coincidence will be
due entirely to letter distribution of the various distributions themselves.

For example:

If the highest index is 1.7 -- letter distribution is normal.

If the highest index is 2.0 -- high frequency letters predominate.

If the highest index is 1.4 -- the intermediate and low frequency letters
predominate.

Therefore, when matching the remaining letters, we can accept the highest index of
coincidence as establishing coincidence, unless the second highest is practically the
same.

Continue the matching process and the reconstruction of the cipher component, noting
that T, N, B and A are already located and thus may be deleted at once from further
test. Begin with the letters of the highest frequency, as they should give the most
positive results. When a letter is placed, delete this location from further test, as
two letters cannot occupy the same space in the cipher component.

Letters are added to the cipher component in the following order (see table "F"):

"Master alphabet" T (1) - N (5) - B (11)

K (12)    1.46    (poor - but acceptable)
V (2)     1.51    (poor - but acceptable)
D (4)     2.05    (excellent)
D (8)     1.76    (good) (D - not certain)
E (16)    1.86    (excellent)
W (23)    1.96    (excellent)
G (19)    1.80    (good)

Note: With this many values a key-word (if any) sequence could be completed by
inspection. In this case, the partially reconstructed cipher component
gives no suggestion of a key-word sequence.

R (21)    1.55    (fair but acceptable)
M (10)    1.65    (good)
M (14)    1.53    (M - not certain)
J (22)    1.89    (excellent)
L (18)    1.71    (good)
L (6)     1.60    (fair) (L - not certain)
S (24)    1.77    (good)
C (25)    2.12    (excellent)
U (7)     1.47    (poor but acceptable)
Y (6)     1.89    (excellent)

Note: This throws out L (6), but leaves L (18) as correct.

I (3)             (fair)
I (13)    1.42    (poor) (I - not certain)
H (13)    1.63    (good)

Note: This throws out I (13) and leaves I (3) as correct.

P (9)     1.95    (excellent)
X (20)    1.90    (excellent)
O (26)    1.72    (good)
Q (14)    1.59    (fair and acceptable)

Note: This throws out M (14) and leaves M (10) as correct.

Z (15)    1.77    (good)
F (8)     1.45    (poor but acceptable)
F (4)     1.20    (very poor)

F (8) is correct and D (4) is correct.

The cipher component has now been completely recovered. 
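
The accepted placements listed above can be collected into the cipher sequence by sorting the letters on their positions. This is only a bookkeeping sketch: the dictionary name is hypothetical, and the positions are those stated in the text after the rejected alternatives (D 8, M 14, L 6, I 13, F 4) are dropped.

```python
# Accepted position of each cipher letter, as given in the text.
placements = {
    "T": 1, "N": 5, "B": 11, "A": 17, "K": 12, "V": 2, "D": 4, "E": 16,
    "W": 23, "G": 19, "R": 21, "M": 10, "J": 22, "L": 18, "S": 24, "C": 25,
    "U": 7, "Y": 6, "I": 3, "H": 13, "P": 9, "X": 20, "O": 26, "Q": 14,
    "Z": 15, "F": 8,
}

# Every space from 1 to 26 must be filled exactly once.
assert sorted(placements.values()) == list(range(1, 27))

component = "".join(sorted(placements, key=placements.get))
print(component)
# prints: TVIDNYUFPMBKHQZEALGXRJWSCO
```

The check that each space is used once mirrors the rule that two letters cannot occupy the same space in the cipher component.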

Note: The process described above has actually built up the complete squared
cipher table of a modified Vigenere table (it remains only to recover the
plain component to complete the Vigenere table). We have written down
the cipher component rather than the complete squared table merely to save
time and effort.



XII THE ROUGHNESS OF MIXED TEXTS



What happens when two different distributions are mixed? As a simple case, let
us suppose we mix some text of I.C. $\gamma$ with flat text in the proportions $R : (1 - R)$.

Then the ith letter has probability

\[ p_i R + (1 - R) \frac{1}{c} \]

and the I.C. is

\[ c \sum_{i=1}^{c} \Bigl( p_i R + \frac{1 - R}{c} \Bigr)^2
   = c \sum_{i=1}^{c} p_i^2 R^2 + 2 c R (1 - R) \sum_{i=1}^{c} \frac{p_i}{c}
   + c \sum_{i=1}^{c} \frac{(1 - R)^2}{c^2} \tag{34} \]

\[ = R^2 \gamma + 2R(1 - R) + (1 - R)^2 \]

\[ = R^2 \gamma - R^2 + 1 = 1 + (\gamma - 1) R^2 . \]

That is, the rough text contributes its bulge in the proportion $R^2$.
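
The effect of diluting rough text with flat text can be checked numerically. The sketch below is illustrative: the function names, the five-letter distribution and the proportion 3/5 are hypothetical, and exact fractions are used so both sides can be compared without rounding.

```python
from fractions import Fraction

def ic(dist):
    """I.C. of a single distribution: c times the sum of squared probabilities."""
    c = len(dist)
    return c * sum(x * x for x in dist)

def mix_with_flat(p, r):
    """Mix distribution `p` (proportion r) with flat text (proportion 1 - r)."""
    c = len(p)
    return [pi * r + (1 - r) / c for pi in p]

# Hypothetical rough distribution over a 5-letter alphabet, mixed 3 : 2 with flat text.
p = [Fraction(n, 10) for n in (5, 3, 1, 1, 0)]
r = Fraction(3, 5)

mixed_ic = ic(mix_with_flat(p, r))
predicted = 1 + (ic(p) - 1) * r ** 2
print(mixed_ic, predicted)
```

Both printed values are the same fraction, so the bulge of the mixture is the bulge of the rough text scaled by the square of its proportion.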

Now as a more complicated case suppose that we mix two rough texts in the
proportions $R : (1 - R)$. Suppose further that the distributions are $p_1, p_2, \ldots, p_c$
and $q_1, q_2, \ldots, q_c$ and

\[ c \sum_{i=1}^{c} p_i^2 = P , \qquad c \sum_{i=1}^{c} q_i^2 = Q . \]

Then the mixture will have as probability of the ith letter

\[ p_i R + q_i (1 - R) \]

and

\[ \gamma = c \sum_{i=1}^{c} \bigl( p_i R + q_i (1 - R) \bigr)^2
          = c \sum_{i=1}^{c} p_i^2 R^2 + 2c \sum_{i=1}^{c} p_i q_i R (1 - R)
          + c \sum_{i=1}^{c} q_i^2 (1 - R)^2 \]

\[ = P R^2 + 2R(1 - R) \, c \sum_{i=1}^{c} p_i q_i + Q (1 - R)^2 . \tag{35} \]

The expression $c \sum p_i q_i = \xi$ we have examined before, and seen to be a
measure of the correlation of the distributions. Since the expected value of $\xi$ is 1,
the expected value of $\gamma$ will be

\[ E(\gamma) = P R^2 + 2R(1 - R) + Q (1 - R)^2 \]

\[ = R^2 + (P - 1) R^2 + 2R - 2R^2 + (1 - R)^2 + (Q - 1)(1 - R)^2 \]

\[ = 1 + (P - 1) R^2 + (Q - 1)(1 - R)^2 . \]

Here again each contributes its bulge in proportion to the square of its weight.

Now that we have seen how the argument goes, we can generalize this result to
a mixture of K samples with distributions $p_{i1}, p_{i2}, \ldots, p_{ic}$,

\[ \sum_{j=1}^{c} p_{ij} = 1 , \]

and $c \sum_{j=1}^{c} p_{ij}^2 = \gamma_i$, the I.C. of the ith sample,
which we will suppose present in the proportion $R_i$,

\[ \sum_{i=1}^{K} R_i = 1 . \]

The mixture will have the jth letter present in the proportion

\[ \sum_{i=1}^{K} R_i p_{ij} \]

so that its I.C. is

\[ \gamma = c \sum_{j=1}^{c} \Bigl( \sum_{i=1}^{K} R_i p_{ij} \Bigr)^2
          = c \sum_{j=1}^{c} \sum_{i=1}^{K} \sum_{l=1}^{K} R_i p_{ij} R_l p_{lj}
          = \sum_{i=1}^{K} \sum_{l=1}^{K} R_i R_l \Bigl( c \sum_{j=1}^{c} p_{ij} p_{lj} \Bigr) . \tag{36} \]
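The internal sum is the measure of the correlation of the ith and lth samples,
which we will designate by

\[ \xi_{il} = c \sum_{j=1}^{c} p_{ij} p_{lj} . \]

Notice that $\xi_{ii} = \gamma_i$ is the I.C. of the ith sample.

Now $\gamma$ becomes

\[ \gamma = \sum_{i=1}^{K} \sum_{l=1}^{K} R_i R_l \xi_{il}
          = \sum_{i=1}^{K} \sum_{l \neq i} R_i R_l \xi_{il} + \sum_{i=1}^{K} R_i^2 \gamma_i \]

\[ = \sum_{i=1}^{K} \sum_{l \neq i} R_i R_l \xi_{il}
   + \sum_{i=1}^{K} \bigl[ R_i^2 + (\gamma_i - 1) R_i^2 \bigr] . \]

Since $E(\xi_{il}) = 1$ for $l \neq i$, we have

\[ E(\gamma) = \sum_{i=1}^{K} \sum_{l \neq i} R_i R_l + \sum_{i=1}^{K} R_i^2
             + \sum_{i=1}^{K} (\gamma_i - 1) R_i^2
             = \sum_{i=1}^{K} \sum_{l=1}^{K} R_i R_l + \sum_{i=1}^{K} (\gamma_i - 1) R_i^2 \]

\[ E(\gamma) = 1 + \sum_{i=1}^{K} (\gamma_i - 1) R_i^2 . \]

Again each sample contributes its bulge according to the square of its presence.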
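
The decomposition of the mixture's I.C. into pairwise correlations can be verified directly for a small case. This is a sketch under assumptions: the function names, the three four-letter samples and their proportions are hypothetical, and exact fractions are used throughout.

```python
from fractions import Fraction

def cross_ic(p, q):
    """Correlation of two samples: c times the sum of products of probabilities."""
    c = len(p)
    return c * sum(pi * qi for pi, qi in zip(p, q))

def mixture_ic(samples, weights):
    """I.C. of the mixture, computed from the mixed distribution itself."""
    c = len(samples[0])
    mixed = [sum(w * s[j] for w, s in zip(weights, samples)) for j in range(c)]
    return cross_ic(mixed, mixed)

def mixture_ic_from_pairs(samples, weights):
    """The same I.C. as a double sum of weighted pairwise correlations."""
    return sum(wi * wl * cross_ic(si, sl)
               for wi, si in zip(weights, samples)
               for wl, sl in zip(weights, samples))

# Hypothetical samples over a 4-letter alphabet, present in proportions 1/2, 1/3, 1/6.
samples = [
    [Fraction(n, 10) for n in (6, 2, 1, 1)],
    [Fraction(n, 8) for n in (1, 5, 2, 0)],
    [Fraction(n, 4) for n in (1, 1, 1, 1)],
]
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

print(mixture_ic(samples, weights), mixture_ic_from_pairs(samples, weights))
```

The two values agree exactly. Replacing each off-diagonal correlation by its expected value of 1 then gives the expected I.C. of the mixture stated above.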



